Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix limit check on variables with double accuracy #1047

Closed
wants to merge 2 commits into from

Conversation

McCodie
Copy link
Contributor

@McCodie McCodie commented Jan 17, 2021

This fixes a bug which avoided to be able to drive to the Z limit after a tool change via G-Code in Axis.

I created an issue:
#1046

@andypugh
Copy link
Collaborator

Can you give a little more information about this? This code has been stable since 2013, so I think we need some sort of evidence to change it.
Note, also, that the preceding sections use "eps" as a notional small amount. Should your addition use the same?

@andypugh
Copy link
Collaborator

@dngarrett : You were in here last (in 2017). Do you have any thoughts?

@McCodie
Copy link
Contributor Author

McCodie commented Jan 18, 2021

I have a really small mill, once I put a rotary table on it I'm really using all the space on it.
After adding tool changing support I cannot issue a G53 Z80 anymore (80 is the limit!), I added some debugging and it said that 80.000000 is exceeding the limit 80.000000, yes since double has an accuracy of around 16 digits.

After applying this patch my tool changing g-code is workable, and the machine retracts to 80mm; it will complain if it's 80.1mm (as it is supposed to be), but 80 itself is ok.

@McCodie
Copy link
Contributor Author

McCodie commented Jan 18, 2021

As a reference.
https://www.exploringbinary.com/decimal-precision-of-binary-floating-point-numbers/

we're dealing with 10^0-10^2 or 3 so the accuracy should be around 15 for double.

Within the last 10 years I only had to deal with such high precision once as far as I remember. I guess most programmers would have to read up about that topic to get into it again.

@phillc54
Copy link
Collaborator

This may the reason I have always set my max limit 0.001mm more than what I wanted and my min limit 0.001mm less than what I wanted so I could avoid that problem.

@dngarrett
Copy link
Collaborator

The eps setting from the 2017 commit:
38bd452 "motion/command.c: avoid spurious error"
addressed a special case for which the
eps=1e-308 could indeed be very small.

The pull request's suggested tests seem a little awkward.
Simpler forms like:
(target > (pl - some_small_value))
(target < (nl + some_small_value))
would be more readable to me.

(i have not been able to reproduce the problem in a sim
config -- posting a complete config would be helpful)

@McCodie
Copy link
Contributor Author

McCodie commented Jan 18, 2021

Well you know the bug feel free to fix it :-) I'm out here.

Read the double documentation again and try to understand it next time.
thanks.

@McCodie
Copy link
Contributor Author

McCodie commented Jan 20, 2021

please also fix inRange in command.c same story here.

I guess the entire emc/linuxcnc project has to be checked for double accuracy issues....

/* inRange() returns non-zero if the position lies within the joint
limits, or 0 if not. It also reports an error for each joint limit
violation. It's possible to get more than one violation per move. */

STATIC int inRange(EmcPose pos, int id, char *move_type)
...
if (joint_pos[joint_num] > joint->max_pos_limit && fabs(joint->max_pos_limit-joint_pos[joint_num])>10e-14) {
in_range = 0;
reportError(_("%s move on line %d would exceed joint %d's positive limit (%f / %f)"),
move_type, id, joint_num, joint_pos[joint_num], joint->max_pos_limit);
}

   if (joint_pos[joint_num] < joint->min_pos_limit && fabs(joint->min_pos_limit-joint_pos[joint_num])>10e-14) {
       in_range = 0;
       reportError(_("%s move on line %d would exceed joint %d's negative limit (%f / %f)"),
                   move_type, id, joint_num, joint_pos[joint_num], joint->min_pos_limit);
   }

sometimes it works sometimes it doesn't, depending on the value determined by the tool setter code.

@McCodie
Copy link
Contributor Author

McCodie commented Jan 20, 2021

.... next one:
control.c static void get_pos_cmds(long period)
/* check for soft limits */
if (joint->pos_cmd > joint->max_pos_limit) {
joint_limit[joint_num][1] = 1;
onlimit = 1;
reportError(("Exceeded POSITIVE soft limit (%.20f) on joint %d %.20f\n"),
joint->max_pos_limit, joint_num, joint->pos_cmd);
}
else if (joint->pos_cmd < joint->min_pos_limit) {
joint_limit[joint_num][0] = 1;
onlimit = 1;
reportError(
("Exceeded NEGATIVE soft limit (%.20f) on joint %d\n %.20f\n"),
joint->min_pos_limit, joint_num, joint->pos_cmd);
}

A screenshot of the output:
https://imgur.com/a/UgWBJRK

@andypugh
Copy link
Collaborator

I don't understand this. Whilst I am clear on the fact that double does not have an exact representation for every real number, it should still be the case that < and > should work for comparing two doubles set to the same real value.

In the screenshot the axis position is outside the axis limits, albeit by a tiny amount.

I would tend to the opinion that the problem lies with the code which is calculating these out-of-limit values, not with the bounds checking itself.

@McCodie
Copy link
Contributor Author

McCodie commented Jan 23, 2021

certainly the double inaccuracy is also within the calculation too, but the low values shouldn't be used for the check at all either.

G53 G01 Z80 Fails and the result is that picture (and other errors, this discussion already shows 3 occurrences in the code where it's an issue). I can go to Z79.9999999 via G-Code, but not to Z80.000

However jogging to Z80 via PAGEUP was no problem.

I've been told that some people tend to use a limit of Z80.001 because they thought that the limit is exclusive the value given in the configuration file, no it is inclusive but the double accuracy issue avoids that. It has been like that for many years it seems.

I'm using the updated version since I reported it and did not experience any issue anymore.
Basically I have added toolchanger support myself including a test option to drive to a tool setter gauge after the change to test the accuracy.
And if there's a rotary head installed on my system I need to retract as far as possible, a bigger milling machine possibly doesn't have to due to more space.

@andypugh
Copy link
Collaborator

OK, I agree that if the G-code says "80" and the INI says "80" then there should not be a problem. But I am puzzled as to why the two places would interpret the string "80" as two different values. As far as I know the G-code interpreter and HAL both use double-precision throughout.
Furthermore, 80.0 is exactly represented even in single-precision: https://www.h-schmidt.net/FloatConverter/IEEE754.html
So if G53 G0 Z80 is actually being interpreted as G0 Z80.00000xxxx then something other than just double-precision inexactitude is happening.

@McCodie
Copy link
Contributor Author

McCodie commented Jan 23, 2021

I understand what you mean, double calculations have to be checked in the entire linuxcnc project. I see many if ... > or < checks, but nothing that indicates that the double precision ever has been taken care about.
Possibly the limits are the only place where it really matters.
All other rounding/accuracy issues are too small to cause problems in the real world.

The procedure to get to that was:

  1. Home machine
  2. possibly do some free jogging
  3. issue m6 tN, the command is rewritten, and still works with Z80 at that point
  4. after touching the z probe G53 .. Z80 doesn't work anymore due to the double inaccuracy

So either there's a memory corruption or cmd_pos will be recalculated from that point on, but that should not affect movements in G53 (machine coordinate system)

@jepler
Copy link
Member

jepler commented Jan 27, 2021

Thanks @McCodie for your efforts to improve LinuxCNC. @andypugh asked me to drop in and give my opinion.

In floating point, there are many numbers A and B for which the usual identity (A+B)-B == A does NOT hold. We do calculations of those types (e.g., adding a tool offset and later subtracting it off; same with G5x and G92 offsets, home switch offsets, etc!) probaby in many places across the code. For instance (0.1 + 0.2) - 0.2 is one such example. (.3 + 80.0) - 80.0 is another. My suspicion is that this is waves hands what's going on, with one of the numbers being an applied tool length, but I can't tell for certain from the description available.

(There are also sometimes surprising units conversions in the software -- I forget if setting LINEAR_UNITS = mm actually forces use of metric units EVERYWHERE or if we still hold to the practice of converting to & from inches somewhere within task & interpreter. It would surprise me a bit if we'd rooted all that out...)

I'm not categorically against such changes, but I'm also not against better documenting the advised practice of setting the soft limits just outside the intended usable volume of the machine, because @McCodie is almost certainly right that this is not the only instance where attempting to go "exactly to" a limit will not work out consistently depending on how offsets combine & cancel. (note: I took a search through the docs and didn't find where it's clearly spelled out, though from the comments in this thread -- thanks @phillc54 ! -- it's clear that this is at least "folk knowledge" among some users)

Another reason for setting soft limits slightly outside of the "intended limits" is feedback position: Both with servo feedback & with steppers there are (A) another set of scalings & roundings and (B) the possibility that either the physical servo OR the stepper velocity ramp will lead to a move INTENDED to terminate exactly at the "limit" be reported as going some counts beyond it.

(also, this folk knowledge is NOT followed in sim/axis.ini but it is in stepconf, but only if the home position is right up against the soft limit:

        # linuxcnc doesn't like having home right on an end of travel,
        # so extend the travel limit by up to .01in or .1mm
        minlim = get("minlim")
        maxlim = get("maxlim")
        home = get("homepos")
        if self.d.units: extend = .001
        else: extend = .01

(also I think the comment's wrong, it must be .001in or .01mm...)
)

@andypugh
Copy link
Collaborator

So, it seems that I was wrong in my assumptions, and jepler thinks that you are on to something.
The extra tweak in the wizards seems like a kludge (ie, it means that you can actually jog out of the region that you configured)

You definitely should be able to move to a point that displays the number that you configured as the limit.

I feel that your condition:
if (joint_pos[joint_num] < joint->min_pos_limit && fabs(joint->min_pos_limit-joint_pos[joint_num])>10e-14)

Could be re-stated as

if (joint_pos[joint_num] < joint->min_pos_limit - 10e-14)

And the same with a + for the positive limit.

I wonder what the worst-case error is, and if it can ever accumulate?

@andypugh
Copy link
Collaborator

I have rebased this against 2.8 (as a bug fix) and pushed it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants